{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 使能图算融合\n",
    "\n",
    "## 概述\n",
    "\n",
    "图算融合是MindSpore特有的网络性能优化技术。它可以通过自动分析和优化现有网络计算图逻辑，并结合目标硬件能力，对计算图进行计算化简和替代、算子拆分和融合、算子特例化编译等优化，以提升设备计算资源利用率，实现对网络性能的整体优化。相比传统优化技术，图算融合具有多算子跨边界联合优化、与算子编译跨层协同、基于Polyhedral的算子即时编译等独特优势。另外，图算融合只需要用户打开对应配置后，整个优化过程即可自动完成，不需要网络开发人员进行其它额外感知，使得用户可以聚焦网络算法实现。\n",
    "\n",
    "图算融合的适用场景包括：\n",
    "\n",
    "- 对网络执行时间具有较高性能要求的场景；\n",
    "- 通过拼接基本算子实现自定义组合算子，并希望对这些基本算子进行自动融合，以提升自定义组合算子性能的场景。\n",
    "\n",
    "接下来，以自定义组合算子开启图算融合为例来体验使能图算融合。\n",
    "\n",
    "> 本文档适用于GPU环境。\n",
    "\n",
    "## 整体流程\n",
    "\n",
    "1. 准备环节。导入公共模块。\n",
    "2. 构造简单`MyNet`网络。对比算子融合前后计算图。\n",
    "3. 自定义组合算子。构造一个简单网络`MyNet`和自定义算子`MyOp`，对比算子融合前后计算图。\n",
    "\n",
    "## 准备环节\n",
    "\n",
    "导入执行以下代码导入所需模块。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import mindspore.context as context\n",
    "from mindspore import Tensor\n",
    "from mindspore.nn import Cell\n",
    "import mindspore.ops as ops"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 构造简单`MyNet`网络\n",
    "\n",
    "当前图算融合优化默认为关闭状态，我们只需在训练脚本中为`context`指定参数`enable_graph_kernel=True`即可启用图算融合。\n",
    "\n",
    "为了说明图算融合优化场景，构造了一个简单网络`MyNet`, 包含一个乘法和加法计算。在打开图算融合进行优化之后，这两个计算便会自动合成一个融合算子。\n",
    "\n",
    "为了对比开启图算融合前后计算图的差异，分别执行以下两段代码，记录两次计算的计算图。其中`graphs_path1`和`graphs_path2`分别为开启图算融合前后进行计算保存的计算图路径。\n",
    "\n",
    "1. 关闭图算融合时进行计算，设置`enable_graph_kernel=False`："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "result: [[2. 2. 2. 2.]\n",
      " [2. 2. 2. 2.]\n",
      " [2. 2. 2. 2.]\n",
      " [2. 2. 2. 2.]]\n"
     ]
    }
   ],
   "source": [
    "graphs_path1 = \"./log/enable_graph_kernel_fusion/graph1\"\n",
    "!rm -rf $graphs_path1\n",
    "\n",
    "context.set_context(mode=context.GRAPH_MODE, device_target=\"GPU\")\n",
    "# save graph ir to view fusion detail.\n",
    "context.set_context(save_graphs=True, save_graphs_path=graphs_path1)\n",
    "# enable graph kernel optimization.\n",
    "context.set_context(enable_graph_kernel=False)\n",
    "\n",
    "class MyNet(Cell):\n",
    "    def __init__(self):\n",
    "        super(MyNet, self).__init__()\n",
    "        self.add = ops.TensorAdd()\n",
    "        self.mul = ops.Mul()\n",
    "\n",
    "    def construct(self, x):\n",
    "        a = self.mul(x, 2.0)\n",
    "        res = self.add(a, 1.0)\n",
    "        return res\n",
    "\n",
    "x = np.ones((4, 4)).astype(np.float32) * 0.5\n",
    "net = MyNet()\n",
    "result = net(Tensor(x))\n",
    "print(\"result: {}\".format(result))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "2. 开启图算融合时进行计算，设置`enable_graph_kernel=True`："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "result: [[2. 2. 2. 2.]\n",
      " [2. 2. 2. 2.]\n",
      " [2. 2. 2. 2.]\n",
      " [2. 2. 2. 2.]]\n"
     ]
    }
   ],
   "source": [
    "graphs_path2 = \"./log/enable_graph_kernel_fusion/graph2\"\n",
    "!rm -rf $graphs_path2\n",
    "\n",
    "context.set_context(mode=context.GRAPH_MODE, device_target=\"GPU\")\n",
    "# save graph ir to view fusion detail.\n",
    "context.set_context(save_graphs=True, save_graphs_path=graphs_path2)\n",
    "# enable graph kernel optimization.\n",
    "context.set_context(enable_graph_kernel=True)\n",
    "\n",
    "net = MyNet()\n",
    "result = net(Tensor(x))\n",
    "print(\"result: {}\".format(result))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "3. 查看计算图\n",
    "\n",
    "    在当前工作目录下执行`mindinsight start --summary-base-dir ./log/enable_graph_kernel_fusion`，其中`./log/enable_graph_kernel_fusion`为保存的所有计算图的主目录。启动MindInsight可视化工具查看计算图，训练看板中`graph1`目录保存的为图算融合前的计算图，`graph2`目录保存的为图算融合后的计算图（参考[MindInsight计算图可视化教程](https://www.mindspore.cn/tutorial/training/zh-CN/r1.0/advanced_use/dashboard.html#id5)）。\n",
    "\n",
    "    ![未开启图算融合](https://gitee.com/mindspore/docs/raw/master/tutorials/notebook/enable_graph_kernel_fusion/images/graph1.png)\n",
    "\n",
    "    <center>图1: 未使能图算融合计算图。</center>\n",
    "\n",
    "    ![开启图算融合](https://gitee.com/mindspore/docs/raw/master/tutorials/notebook/enable_graph_kernel_fusion/images/graph2.png)\n",
    "\n",
    "    <center>图2: 开启使能图算融合计算图。</center>\n",
    "\n",
    "    根据该计算图的结果所示，其中图1为未使能图算融合时的对应计算图，图2为使能图算融合后的对应计算图。可以看到该网络中的加法和乘法被融合成一个算子。\n",
    "\n",
    "## 自定义组合算子\n",
    "\n",
    "基于图算融合技术，用户可以很方便地实现高性能的自定义组合算子。其主要流程为：\n",
    "\n",
    "1. 在脚本中用基本算子组合的方式实现自定义算子定义和使用；\n",
    "2. 打开图算融合配置；\n",
    "3. 图算融合对自定义组合算子中的基本算子自动进行算子融合，并生成高性能融合算子。\n",
    "\n",
    "相比其它自定义算子方式，这种方式具有对框架无侵入、简单易用等优点。\n",
    "\n",
    "构造一个简单网络`MyNet`，并在其中使用了自定义算子`MyOp`。分别执行以下两段代码，记录两次计算的计算图。其中`graphs_path3`和`graphs_path4`分别为开启图算融合前后进行计算保存的计算图路径。\n",
    "\n",
    "1. 关闭图算融合时进行计算，设置`enable_graph_kernel=False`："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "result: [[-0.015104 -0.015104 -0.015104 -0.015104]\n",
      " [-0.015104 -0.015104 -0.015104 -0.015104]\n",
      " [-0.015104 -0.015104 -0.015104 -0.015104]\n",
      " [-0.015104 -0.015104 -0.015104 -0.015104]]\n"
     ]
    }
   ],
   "source": [
    "graphs_path3 = \"./log/enable_graph_kernel_fusion/graph3\"\n",
    "!rm -rf $graphs_path3\n",
    "\n",
    "context.set_context(mode=context.GRAPH_MODE, device_target=\"GPU\")\n",
    "# enable graph kernel optimization.\n",
    "context.set_context(save_graphs=True, save_graphs_path=graphs_path3)\n",
    "context.set_context(enable_graph_kernel=False)\n",
    "\n",
    "class MyOp(Cell):\n",
    "    \"\"\" my first custom OP composited by basic OPs \"\"\"\n",
    "    def __init__(self):\n",
    "        super(MyOp, self).__init__()\n",
    "        self.sub = ops.operations.Sub()\n",
    "        self.mul = ops.operations.Mul()\n",
    "\n",
    "    def construct(self, x, y):\n",
    "        a = self.sub(x, y)\n",
    "        return self.mul(a, x)\n",
    "\n",
    "class MyNet(Cell):\n",
    "    def __init__(self):\n",
    "        super(MyNet, self).__init__()\n",
    "        self.mul = ops.operations.Mul()\n",
    "        self.pow = ops.operations.Pow()\n",
    "        self.my_op = MyOp()\n",
    "\n",
    "    def construct(self, x, y):\n",
    "        a = self.mul(x, 2.0)\n",
    "        b = self.pow(a, 3.0)\n",
    "        res = self.my_op(b, y)\n",
    "        return res\n",
    "\n",
    "x = np.ones((4, 4)).astype(np.float32) * 0.2\n",
    "y = np.ones((4, 4)).astype(np.float32) * 0.3\n",
    "net = MyNet()\n",
    "result = net(Tensor(x), Tensor(y))\n",
    "print(\"result: {}\".format(result))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "2. 开启图算融合时进行计算，设置`enable_graph_kernel=True`："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "result: [[-0.015104 -0.015104 -0.015104 -0.015104]\n",
      " [-0.015104 -0.015104 -0.015104 -0.015104]\n",
      " [-0.015104 -0.015104 -0.015104 -0.015104]\n",
      " [-0.015104 -0.015104 -0.015104 -0.015104]]\n"
     ]
    }
   ],
   "source": [
    "graphs_path4 = \"./log/enable_graph_kernel_fusion/graph4\"\n",
    "!rm -rf $graphs_path4\n",
    "\n",
    "context.set_context(mode=context.GRAPH_MODE, device_target=\"GPU\")\n",
    "# enable graph kernel optimization.\n",
    "context.set_context(save_graphs=True, save_graphs_path=graphs_path4)\n",
    "context.set_context(enable_graph_kernel=True)\n",
    "\n",
    "net = MyNet()\n",
    "result = net(Tensor(x), Tensor(y))\n",
    "print(\"result: {}\".format(result))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "3. 在MindInsight中查看生成的计算图。MindInsight训练看板中`graph3`目录保存的为图算融合前的计算图，`graph4`目录保存的为图算融合后的计算图。\n",
    "\n",
    "    ![未开启图算融合](https://gitee.com/mindspore/docs/raw/master/tutorials/notebook/enable_graph_kernel_fusion/images/graph3.png)\n",
    "\n",
    "    <center>图3: 未使能图算融合计算图。</center>\n",
    "\n",
    "    ![开启图算融合](https://gitee.com/mindspore/docs/raw/master/tutorials/notebook/enable_graph_kernel_fusion/images/graph4.png)\n",
    "\n",
    "    <center>图4: 开启使能图算融合计算图。</center>\n",
    "\n",
    "    根据该计算图的结果所示，其中图3为未使能图算融合时的对应计算图，图4为使能图算融合后的对应计算图。可以看到不仅自定义算子`MyOp`中的基本算子进行了融合，并且与其他算子也进行了更大范围融合。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 总结\n",
    "\n",
    "以上便完成了图算融合的体验过程，我们通过本次体验全面了解了如何开启图算融合模式，理解了如何生成高性能的融合算子。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
